How Big Hadoop Clusters Break in the Real World

Authors

  • Ariel Rabkin
  • Randy Katz

Abstract

Hadoop is among today’s most widely deployed “big data” systems. Cloudera is a company offering paid Hadoop services and support. This poster abstract describes lessons from examining a sample of 293 support tickets, from February through July of 2011. We manually labelled the tickets in our sample with the established root cause and the specific system component being worked on. Tickets cover not only the core Hadoop filesystem and MapReduce implementation, but other services, such as HBase, a BigTable clone, and the ZooKeeper coordination service.

Similar resources

Constructing gazetteers from volunteered Big Geo-Data based on Hadoop

Traditional gazetteers are built and maintained by authoritative mapping agencies. In the age of Big Data, it is possible to construct gazetteers in a data-driven approach by mining rich volunteered geographic information (VGI) from the Web. In this research, we build a scalable distributed platform and a high-performance geoprocessing workflow based on the Hadoop ecosystem to harvest crowd-sou...

Impact of Big Data: Networking Considerations and Case Study

Due to the explosive growth of data volume by mobile devices and SNS (Social Networking Service), Big Data has recently become one of the important issues in the networking world. Big traffic is generated as Big Data processing steps and multiple regionally distributed data centers are included, and/or data are delivered among clusters for the purpose of storage hierarchy management. Therefore, ...

Analysing Distributed Big Data through Hadoop Map Reduce

This term paper focuses on how the big data is analysed in a distributed environment through Hadoop Map Reduce. Big Data is the same as “small data” but bigger in size. Thus, it is approached in different ways. Storage of Big Data requires analysing the characteristics of data. It can be processed by the employment of Hadoop Map Reduce. Map Reduce is a programming model working parallel for large c...

Big Data Processing with Hadoop Map-reduce

The amount of data in our world has been exploding, and analyzing large data sets—so-called big data—will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. The increasing volume and detail of information captured by enterprises, the rise of multimedia, social media, and the Internet of Things will fuel exponential growth in data ...

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...


Publication date: 2011